Goto

Collaborating Authors

 projection image


SparseFourierBackpropagationinCryo-EM Reconstruction

Neural Information Processing Systems

Thepresence of multiple structural states in the data represents a major bottleneck in existing processing pipelines, often requiring expert user supervision.


Marker Track: Accurate Fiducial Marker Tracking for Evaluation of Residual Motions During Breath-Hold Radiotherapy

arXiv.org Artificial Intelligence

Fiducial marker positions in projection image of cone-beam computed tomography (CBCT) scans have been studied to evaluate daily residual motion during breath-hold radiation therapy. Fiducial marker migration posed challenges in accurately locating markers, prompting the development of a novel algorithm that reconstructs volumetric probability maps of marker locations from filtered gradient maps of projections. This guides the development of a Python-based algorithm to detect fiducial markers in projection images using Meta AI's Segment Anything Model 2 (SAM 2). Retrospective data from a pancreatic cancer patient with two fiducial markers were analyzed. The three-dimensional (3D) marker positions from simulation computed tomography (CT) were compared to those reconstructed from CBCT images, revealing a decrease in relative distances between markers over time. Fiducial markers were successfully detected in 2777 out of 2786 projection frames. The average standard deviation of superior-inferior (SI) marker positions was 0.56 mm per breath-hold, with differences in average SI positions between two breath-holds in the same scan reaching up to 5.2 mm, and a gap of up to 7.3 mm between the end of the first and beginning of the second breath-hold. 3D marker positions were calculated using projection positions and confirmed marker migration. This method effectively calculates marker probability volume and enables accurate fiducial marker tracking during treatment without requiring any specialized equipment, additional radiation doses, or manual initialization and labeling. It has significant potential for automatically assessing daily residual motion to adjust planning margins, functioning as an adaptive radiation therapy tool.


Position-aware Guided Point Cloud Completion with CLIP Model

arXiv.org Artificial Intelligence

Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish a Point-Text-Image triplet corpus PCI-TI and MVP-TI based on the existing unimodal point cloud completion dataset and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.


Rigid Single-Slice-in-Volume registration via rotation-equivariant 2D/3D feature matching

arXiv.org Artificial Intelligence

In medical imaging, the aim is often to place a 2D image in a 3D volumetric observation to w. Current approaches for rigid single slice in volume registration are limited by requirements such as pose initialization, stacks of adjacent slices, or reliable anatomical landmarks. Here, we propose a self-supervised 2D/3D registration approach to match a single 2D slice to the corresponding 3D volume. The method works in data without anatomical priors such as images of tumors. It addresses the dimensionality disparity and establishes correspondences between 2D in-plane and 3D out-of-plane rotation-equivariant features by using group equivariant CNNs. These rotation-equivariant features are extracted from the 2D query slice and aligned with their 3D counterparts. Results demonstrate the robustness of the proposed slice-in-volume registration on the NSCLC-Radiomics CT and KIRBY21 MRI datasets, attaining an absolute median angle error of less than 2 degrees and a mean-matching feature accuracy of 89% at a tolerance of 3 pixels.


Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

arXiv.org Artificial Intelligence

4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.


Localization of Pallets on Shelves Using Horizontal Plane Projection of a 360-degree Image

arXiv.org Artificial Intelligence

In this paper, we propose a method for calculating the three-dimensional (3D) position and orientation of a pallet placed on a shelf on the side of a forklift truck using a 360-degree camera. By using a 360-degree camera mounted on the forklift truck, it is possible to observe both the pallet at the side of the forklift and one several meters ahead. However, the pallet on the obtained image is observed with different distortion depending on its 3D position, so that it is difficult to extract the pallet from the image. To solve this problem, a method [1] has been proposed for detecting a pallet by projecting a 360-degree image on a vertical plane that coincides with the front of the shelf to calculate an image similar to the image seen from the front of the shelf. At the same time as the detection, the approximate position and orientation of the detected pallet can be obtained, but the accuracy is not sufficient for automatic control of the forklift truck. In this paper, we propose a method for accurately detecting the yaw angle, which is the angle of the front surface of the pallet in the horizontal plane, by projecting the 360-degree image on a horizontal plane including the boundary line of the front surface of the detected pallet. The position of the pallet is also determined by moving the vertical plane having the detected yaw angle back and forth, and finding the position at which the degree of coincidence between the projection image on the vertical plane and the actual size of the front surface of the pallet is maximized. Experiments using real images taken in a laboratory and an actual warehouse have confirmed that the proposed method can calculate the position and orientation of a pallet within a reasonable calculation time and with the accuracy necessary for inserting the fork into the hole in the front of the pallet.


Computerized Tomography and Reproducing Kernels

arXiv.org Machine Learning

The X-ray transform is one of the most fundamental integral operators in image processing and reconstruction. In this article, we revisit its mathematical formalism, and propose an innovative approach making use of Reproducing Kernel Hilbert Spaces (RKHS). Within this framework, the X-ray transform can be considered as a natural analogue of Euclidean projections. The RKHS framework considerably simplifies projection image interpolation, and leads to an analogue of the celebrated representer theorem for the problem of tomographic reconstruction. It leads to methodology that is dimension-free and stands apart from conventional filtered back-projection techniques, as it does not hinge on the Fourier transform. It also allows us to establish sharp stability results at a genuinely functional level, but in the realistic setting where the data are discrete and noisy. The RKHS framework is amenable to any reproducing kernel on a unit ball, affording a high level of generality. When the kernel is chosen to be rotation-invariant, one can obtain explicit spectral representations which elucidate the regularity structure of the associated Hilbert spaces, and one can also solve the reconstruction problem at the same computational cost as filtered back-projection.


Boosted ab initio Cryo-EM 3D Reconstruction with ACE-EM

arXiv.org Artificial Intelligence

The central problem in cryo-electron microscopy (cryo-EM) is to recover the 3D structure from noisy 2D projection images which requires estimating the missing projection angles (poses). Recent methods attempted to solve the 3D reconstruction problem with the autoencoder architecture, which suffers from the latent vector space sampling problem and frequently produces suboptimal pose inferences and inferior 3D reconstructions. Here we present an improved autoencoder architecture called ACE (Asymmetric Complementary autoEncoder), based on which we designed the ACE-EM method for cryo-EM 3D reconstructions. Compared to previous methods, ACE-EM reached higher pose space coverage within the same training time and boosted the reconstruction performance regardless of the choice of decoders. With this method, the Nyquist resolution (highest possible resolution) was reached for 3D reconstructions of both simulated and experimental cryo-EM datasets. Furthermore, ACE-EM is the only amortized inference method that reached the Nyquist resolution.


IGCN: Image-to-graph Convolutional Network for 2D/3D Deformable Registration

arXiv.org Artificial Intelligence

Organ shape reconstruction based on a single-projection image during treatment has wide clinical scope, e.g., in image-guided radiotherapy and surgical guidance. We propose an image-to-graph convolutional network that achieves deformable registration of a 3D organ mesh for a single-viewpoint 2D projection image. This framework enables simultaneous training of two types of transformation: from the 2D projection image to a displacement map, and from the sampled per-vertex feature to a 3D displacement that satisfies the geometrical constraint of the mesh structure. Assuming application to radiation therapy, the 2D/3D deformable registration performance is verified for multiple abdominal organs that have not been targeted to date, i.e., the liver, stomach, duodenum, and kidney, and for pancreatic cancer. The experimental results show shape prediction considering relationships among multiple organs can be used to predict respiratory motion and deformation from digitally reconstructed radiographs with clinically acceptable accuracy.


Noise2Inverse: Self-supervised deep convolutional denoising for linear inverse problems in imaging

arXiv.org Machine Learning

Recovering a high-quality image from noisy indirect measurement is an important problem with many applications. For such inverse problems, supervised deep convolutional neural network (CNN)-based denoising methods have shown strong results, but their success critically depends on the availability of a high-quality training dataset of similar measurements. For image denoising, methods are available that enable training without a separate training dataset by assuming that the noise in two different pixels is uncorrelated. However, this assumption does not hold for inverse problems, resulting in artifacts in the output of existing methods. Here, we propose Noise2Inverse, a deep CNN-based denoising method for linear inverse problems in imaging that does not require any additional clean or noisy data. Training a CNN-based denoiser is enabled by exploiting the noise model to compute multiple statistically independent reconstructions. We develop a theoretical framework which shows that such training indeed obtains a denoising CNN, assuming the measured noise is element-wise independent and zero-mean. On simulated CT datasets, Noise2Inverse demonstrates a substantial improvement in peak signal-to-noise ratio (> 2dB) and structural similarity index (> 30%) compared to image denoising methods and conventional reconstruction methods, such as Total-Variation Minimization. We also demonstrate that the method is able to significantly reduce noise in challenging real-world experimental datasets.